There are two possible outcomes:
These will be analyzed as separate response variables, since it is possible to have one and not the other. Thus, we’ll have two models to report results on in the end:
\[ Glanced = \beta_1*Error Type + \beta_2*Sign Position + \beta_3*Age + \beta_4*Gender + \epsilon \]
And
\[ Called = \beta_1*Error Type + \beta_2*Sign Position + \beta_3*Age + \beta_4*Gender + \epsilon \] These will be logistic models, since the outcome variable in each case is binary. We’ll use model selection to decide if the mediating variables of age and gender are useful. In addition, we may consider the order that the scenarios were run as well, namely ‘first parallel’ and ‘first perpendicular’.
From study metadata:
| Variable | Explanation |
|---|---|
| Participant ID | 1 thru 48 |
| Age | Participants Age |
| Gender | Participants Gender (M, F, TGNC) |
| Crossing ID | Run 1 = 1 (parallel), Run 2 = 2 (parallel), Run 3 = 3 (Perpendicular), Run 4 = 4 (Perpendicular), Run 5 = 5(Stuck on Track) |
| Error Type | Type of Malfunction (Gate, No Train or Train, No Gate) |
| Sign Position | Position if ENS (Parallel or Perpendicular) |
| Glanced | 0 = no, 1 = Yes |
| Called | 0 = no, 1 = Yes |
Quick exploratory data analysis to check for data completeness and understand distributions.
library(readxl)
library(tidyverse)
library(knitr)
library(kableExtra)
library(ggplot2)
library(plotly)
library(sjPlot)
d <- read_xlsx(file.path('Data', 'Analysis and Glance Score 3 17 20.xlsx'), sheet = 'Data')
# Filter out rows where glanced or called is NA
d <- d %>%
filter(!(is.na(Glanced) | is.na(Called)))
summ_tab <- d %>%
group_by(`Error Type`, `Sign Position`) %>%
summarize(count = n(),
Correct_Glance = sum(Glanced),
Correct_Called = sum(Called),
Pct_Correct_Glance = Correct_Glance / count,
Pct_Correct_Called = Correct_Called /count)
kable(summ_tab,
caption = 'Summary of data') %>%
kable_styling(bootstrap_options = c('striped','hover'))
| Error Type | Sign Position | count | Correct_Glance | Correct_Called | Pct_Correct_Glance | Pct_Correct_Called |
|---|---|---|---|---|---|---|
| Gate, No Train | parallel | 41 | 29 | 5 | 0.7073171 | 0.1219512 |
| Gate, No Train | perpendicular | 40 | 7 | 3 | 0.1750000 | 0.0750000 |
| Stuck | parallel | 20 | 2 | 2 | 0.1000000 | 0.1000000 |
| Stuck | perpendicular | 20 | 10 | 10 | 0.5000000 | 0.5000000 |
| Train, No Gate | parallel | 41 | 30 | 10 | 0.7317073 | 0.2439024 |
| Train, No Gate | perpendicular | 40 | 8 | 5 | 0.2000000 | 0.1250000 |
It seems clear that parallel is the better orientation for both the gate, no train and train, no gate conditions. However, when stuck on the track, it seems better for the sign to be perpendicular to the train track.
gp1 <-
ggplot(summ_tab, aes(x = `Error Type`, color = `Sign Position`)) +
geom_point(aes(y = Pct_Correct_Glance, size = count)) +
scale_size(range = c(3, 8)) +
ylim(c(0, 1)) +
theme_bw() +
ggtitle('Percent of correct glances by error type and sign position')
gp2 <-
ggplot(summ_tab, aes(x = `Error Type`, color = `Sign Position`)) +
geom_point(aes(y = Pct_Correct_Called, size = count)) +
scale_size(range = c(3, 8)) +
ylim(c(0, 1)) +
theme_bw() +
ggtitle('Percent of correct calling behavior by error type and sign position')
ggplotly(gp1)
ggplotly(gp2)
Taking a look at the gender differences:
summ_tab2 <- d %>%
group_by(Gender, `Error Type`) %>%
summarize(count = n(),
Correct_Glance = sum(Glanced),
Correct_Called = sum(Called),
Pct_Correct_Glance = Correct_Glance / count,
Pct_Correct_Called = Correct_Called /count)
gp3 <-
ggplot(summ_tab2, aes(x = `Error Type`, color = Gender)) +
geom_point(aes(y = Pct_Correct_Glance, size = count)) +
scale_size(range = c(3, 8)) +
ylim(c(0, 1)) +
theme_bw() +
ggtitle('Percent of correct glance behavior by error type and gender')
gp4 <-
ggplot(summ_tab2, aes(x = `Error Type`, color = Gender)) +
geom_point(aes(y = Pct_Correct_Called, size = count)) +
scale_size(range = c(3, 8)) +
ylim(c(0, 1)) +
theme_bw() +
ggtitle('Percent of correct calling behavior by error type and gender')
ggplotly(gp3)
ggplotly(gp4)
# Intercept = first alphabetical category of error type, namely gate, no train
gm1 <- glm(Glanced ~ `Error Type` + `Sign Position`,
family = 'binomial',
data = d)
# Now add interaction
gm2 <- glm(Glanced ~ `Error Type` * `Sign Position`,
family = 'binomial',
data = d)
AIC(gm1, gm2) # Second model wins
# Add age and gender to gm2.
gm3a <- glm(Glanced ~ `Error Type` * `Sign Position` + Age + Gender,
family = 'binomial',
data = d)
gm3b <- glm(Glanced ~ `Error Type` * `Sign Position` + Gender,
family = 'binomial',
data = d)
AIC(gm2, gm3a, gm3b) # Some improvment, best with age included
tab_model(gm3a)
| Glanced | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 9.74 | 2.63 – 40.21 | 0.001 |
Error TypeStuck
|
0.05 | 0.01 – 0.20 | <0.001 |
Error TypeTrain, No Gate
|
1.14 | 0.42 – 3.09 | 0.801 |
Sign Positionperpendicular
|
0.08 | 0.02 – 0.22 | <0.001 |
| Age | 0.98 | 0.95 – 1.00 | 0.072 |
| GenderM | 0.52 | 0.25 – 1.06 | 0.074 |
| GenderTGNC | 0.95 | 0.19 – 5.02 | 0.948 |
Error TypeStuck:Sign Positionperpendicular
|
96.69 | 13.82 – 961.21 | <0.001 |
Error TypeTrain, No Gate:Sign Positionperpendicular
|
1.04 | 0.23 – 4.82 | 0.955 |
| Observations | 202 | ||
| R2 Tjur | 0.316 | ||
Interpretation:
The manual calculations below show how the odds and odds ratios would be calculated for just the Error Type variable alone. The statistical model gm3 above accounts for all the variables. The example below shows how the intercept = the Odds of the first level of the categorical variable, while the other coefficients are the ratios of the odds of that category compared to the intercept.
d_sum <- d %>%
group_by(`Error Type`) %>%
summarize(count = n(),
Correct_Glance = sum(Glanced),
Pct_Correct_Glance = Correct_Glance / count,
Odds = ( Correct_Glance / (count - Correct_Glance) )
)
kable(d_sum, caption = 'Example calculations of odds') %>%
kable_styling(bootstrap_options = c('striped','hover'))
| Error Type | count | Correct_Glance | Pct_Correct_Glance | Odds |
|---|---|---|---|---|
| Gate, No Train | 81 | 36 | 0.4444444 | 0.8000000 |
| Stuck | 40 | 12 | 0.3000000 | 0.4285714 |
| Train, No Gate | 81 | 38 | 0.4691358 | 0.8837209 |
d_sum$Odds[2] / d_sum$Odds[1]
## [1] 0.5357143
d_sum$Odds[3] / d_sum$Odds[1]
## [1] 1.104651
# Compare to a model.
gm_ex <- glm(Glanced ~ `Error Type`,
family = 'binomial',
data = d)
tab_model(gm_ex)
| Glanced | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 0.80 | 0.51 – 1.24 | 0.318 |
Error TypeStuck
|
0.54 | 0.23 – 1.18 | 0.129 |
Error TypeTrain, No Gate
|
1.10 | 0.59 – 2.05 | 0.752 |
| Observations | 202 | ||
| R2 Tjur | 0.016 | ||
# Intercept = first alphabetical category of error type, namely gate, no train
cm1 <- glm(Called ~ `Error Type` + `Sign Position`,
family = 'binomial',
data = d)
# Now add interaction
cm2 <- glm(Called ~ `Error Type` * `Sign Position`,
family = 'binomial',
data = d)
AIC(cm1, cm2) # Second model wins
# Add age and gender to cm2
cm3a <- glm(Called ~ `Error Type` * `Sign Position` + Age + Gender,
family = 'binomial',
data = d)
# Without age
cm3b <- glm(Called ~ `Error Type` * `Sign Position` + Gender,
family = 'binomial',
data = d)
AIC(cm2, cm3a, cm3b) # cm3b best model (without age)
tab_model(cm3b)
| Called | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 0.25 | 0.08 – 0.64 | 0.007 |
Error TypeStuck
|
0.81 | 0.10 – 4.39 | 0.814 |
Error TypeTrain, No Gate
|
2.49 | 0.76 – 9.11 | 0.144 |
Sign Positionperpendicular
|
0.58 | 0.11 – 2.63 | 0.487 |
| GenderM | 0.23 | 0.09 – 0.55 | 0.001 |
| GenderTGNC | 1.52 | 0.30 – 6.76 | 0.591 |
Error TypeStuck:Sign Positionperpendicular
|
20.02 | 2.09 – 261.83 | 0.013 |
Error TypeTrain, No Gate:Sign Positionperpendicular
|
0.73 | 0.10 – 5.54 | 0.751 |
| Observations | 202 | ||
| R2 Tjur | 0.164 | ||
Interpretation: